Matrix Bidiagonalization on the Trident Processor
Authors
Abstract
This paper discusses the implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which uses a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform the vector, matrix-vector, and matrix operations needed for bidiagonalizing a matrix. The number of clock cycles per FLOP is used as a metric to evaluate the performance of the Trident processor. Our results show that increasing the number of Trident lanes proportionally decreases the number of cycles needed per FLOP. On a 32K × 32K matrix with 128 Trident lanes, using matrix-vector operations in the standard Golub and Kahan algorithm gives a speedup of around 1.5 times over using vector operations. Using matrix operations in the _GEBRD subroutine, however, gives a speedup of around 3 times over vector operations, and around 2 times over using matrix-vector operations in the standard Golub and Kahan algorithm.
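As a reference point for the algorithm the abstract describes, the following is a minimal NumPy sketch of Golub and Kahan Householder bidiagonalization: left Householder reflections zero each column below the diagonal, and right reflections zero each row beyond the superdiagonal. This dense two-sided update is illustrative only; it is not the Trident or LAPACK _GEBRD implementation, and the function names are our own.

```python
import numpy as np

def householder(x):
    """Return a unit Householder vector v so that
    (I - 2 v v^T) x is a multiple of e1."""
    v = x.astype(float).copy()
    v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
    norm = np.linalg.norm(v)
    if norm > 0:
        v /= norm
    return v

def bidiagonalize(A):
    """Golub-Kahan Householder bidiagonalization (sketch).
    For an m x n matrix with m >= n, returns B = U^T A V,
    which is upper bidiagonal and shares A's singular values."""
    B = A.astype(float).copy()
    m, n = B.shape
    for j in range(n):
        # Left reflection: zero out B[j+1:, j] (column below diagonal).
        v = householder(B[j:, j])
        B[j:, j:] -= 2.0 * np.outer(v, v @ B[j:, j:])
        if j < n - 2:
            # Right reflection: zero out B[j, j+2:] (row past superdiagonal).
            w = householder(B[j, j + 1:])
            B[j:, j + 1:] -= 2.0 * np.outer(B[j:, j + 1:] @ w, w)
    return B
```

The left update `v @ B[j:, j:]` and the right update `B[j:, j+1:] @ w` are the matrix-vector products the abstract refers to; they dominate the flop count, which is why mapping them efficiently onto the Trident lanes matters.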
Similar resources
BLAS on the Trident Processor: Implementation and Performance Evaluation
This paper describes the implementation of the Basic Linear Algebra Subprograms (BLAS), which are widely used in many applications, on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector-vector, matrix-vector, and matrix-matrix operations needed for implementing BLAS. The TFLOPS rate on infinite-size pro...
Full text
Trident: A Scalable Architecture for Scalar, Vector, and Matrix Operations
Within a few years it will be possible to integrate a billion transistors on a single chip. At this integration level, we propose using a high level ISA to express parallelism to hardware instead of using a huge transistor budget to dynamically extract it. Since the fundamental data structures for a wide variety of applications are scalars, vectors, and matrices, our proposed Trident processor exte...
Full text
Bidiagonalization with Parallel Tiled Algorithms
We consider algorithms for going from a “full” matrix to a condensed “band bidiagonal” form using orthogonal transformations. We use the framework of “algorithms by tiles”. Within this framework, we study: (i) the tiled bidiagonalization algorithm BiDiag, which is a tiled version of the standard scalar bidiagonalization algorithm; and (ii) the R-bidiagonalization algorithm R-BiDiag, which is a ...
Full text
Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation
In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...
Full text
Divide and Conquer Low-rank Preconditioning Techniques
This paper presents a preconditioning method based on a recursive multilevel low-rank approximation approach. The basic idea is to recursively divide the problem into two and apply a low-rank approximation to a matrix obtained from the Sherman-Morrison formula. The low-rank approximation may be computed by the partial Singular Value Decomposition (SVD) or it can be approximated by the Lanczos bi...
Full text